Project-Team:STARS

Inria | Raweb 2016 | Presentation of the Project-Team STARS | STARS Web Site



Section: New Results

Pedestrian Detection on Crossroads

Participants: Ujwal Ujwal, François Brémond.

Pedestrian detection holds a specific place among object detection problems in computer vision. As automated surveillance systems are deployed in a growing number of areas, the demand for highly robust and accurate pedestrian detection systems keeps increasing. Recently, deep learning has emerged as an important paradigm for tackling complex object detection problems. This year, we performed our initial studies on pedestrian detection using deep learning techniques. These studies form an important basis for extending our work in the future.

Evaluation Metrics

We compared pedestrian detection systems using standard evaluation metrics. In pedestrian detection, the most widely used metric is the miss rate (MR). Miss rate is closely related to recall, another very common metric in computer vision, especially in image and concept retrieval problems. Miss rate is defined as follows:

$$\text{Miss Rate} = \frac{\text{False Negatives}}{\text{True Positives} + \text{False Negatives}} \tag{1}$$
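Since recall is the fraction of annotated pedestrians that are detected, equation 1 is its complement. The short derivation below makes this explicit; the counts in the numerical line are hypothetical and are not taken from our experiments.

```latex
\text{Recall} = \frac{TP}{TP + FN},
\qquad
\text{Miss Rate} = \frac{FN}{TP + FN} = 1 - \text{Recall}

% Hypothetical counts: TP = 80 detected pedestrians, FN = 20 missed ones
\text{Miss Rate} = \frac{20}{80 + 20} = 0.20
```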

Figure 10. True and False Positives in pedestrian detection

In equation 1, True Positives (TP) and False Negatives (FN) are illustrated in Figure 10. A good pedestrian detector should not miss many people in a scene, and this is what equation 1 measures. It should also produce as few False Positives (FP) as possible. In the literature, this is usually expressed as False Positives Per Image (FPPI), that is, the total number of FP detections divided by the number of images.
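To make the two quantities concrete, here is a minimal sketch of how TP, FP and FN could be counted and turned into MR and FPPI. It is not the evaluation code used for our experiments. The box layout `(x1, y1, x2, y2)`, the greedy matching and the intersection-over-union threshold of 0.5 are assumptions chosen for illustration, and all function names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def count_image(dets: List[Box], gts: List[Box], thr: float = 0.5):
    """Return (TP, FP, FN) for one image.

    `dets` is assumed to be sorted by decreasing confidence. Each ground-truth
    box can be matched by at most one detection (greedy matching).
    """
    matched = set()
    tp = fp = 0
    for d in dets:
        # Look for the best still-unmatched ground-truth box with IoU >= thr.
        best_j, best_iou = -1, thr
        for j, g in enumerate(gts):
            overlap = iou(d, g)
            if j not in matched and overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0:
            matched.add(best_j)
            tp += 1
        else:
            fp += 1
    fn = len(gts) - len(matched)
    return tp, fp, fn


def miss_rate_and_fppi(per_image):
    """Aggregate a list of (detections, ground truths) pairs into (MR, FPPI)."""
    tp = fp = fn = 0
    for dets, gts in per_image:
        t, f, n = count_image(dets, gts)
        tp, fp, fn = tp + t, fp + f, fn + n
    mr = fn / (tp + fn) if (tp + fn) else 0.0
    fppi = fp / len(per_image) if per_image else 0.0
    return mr, fppi


# Toy data (hypothetical): two images, three annotated pedestrians in total.
toy = [
    ([(0, 0, 10, 10), (50, 50, 60, 60)], [(0, 0, 10, 10), (20, 20, 30, 30)]),
    ([(1, 0, 11, 10)], [(0, 0, 10, 10)]),
]
mr, fppi = miss_rate_and_fppi(toy)
print(mr, fppi)  # one pedestrian of three missed, one false positive in two images
```

On this toy input one pedestrian out of three is missed and one detection out of three is spurious, so MR is 1/3 and FPPI is 0.5.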

Pedestrian detection systems usually depend on a number of parameters, and different parameter values tune a system to different MR and FPPI values. This trade-off is usually shown as a curve of MR against FPPI, which plays the same role as a precision-recall (PR) curve. The curve is created by varying a control parameter of the system, typically the detection score threshold, and plotting the resulting MR and FPPI values. In the literature it is customary to report the MR value at 0.1 FPPI.
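The construction of such a curve can be sketched as follows, assuming the control parameter is a confidence threshold and that every detection has already been labelled as a true or false positive. The function names are hypothetical, and reading the MR at 0.1 FPPI as the lowest miss rate among operating points that do not exceed that FPPI is a simplifying assumption; benchmark toolkits may interpolate instead.

```python
def mr_fppi_curve(scored, n_gt, n_images):
    """Sweep the score threshold over every detection score.

    `scored` is a list of (confidence, is_true_positive) pairs pooled over the
    whole test set; scores are assumed to be distinct. Returns one
    (threshold, miss_rate, fppi) triple per threshold, from strict to lenient.
    """
    curve = []
    tp = fp = 0
    for score, is_tp in sorted(scored, key=lambda s: -s[0]):
        if is_tp:
            tp += 1
        else:
            fp += 1
        curve.append((score, 1.0 - tp / n_gt, fp / n_images))
    return curve


def miss_rate_at(curve, target_fppi=0.1):
    """Lowest miss rate reachable without exceeding `target_fppi`."""
    reachable = [mr for _, mr, fppi in curve if fppi <= target_fppi]
    return min(reachable) if reachable else 1.0


# Toy data (hypothetical): 6 detections, 4 annotated pedestrians, 10 images.
scored = [(0.9, True), (0.8, True), (0.7, False),
          (0.6, True), (0.5, False), (0.4, True)]
curve = mr_fppi_curve(scored, n_gt=4, n_images=10)
print(miss_rate_at(curve))  # 0.25
```

Lowering the threshold moves along the curve towards lower MR and higher FPPI, which is why a single operating point such as 0.1 FPPI is needed to compare detectors with one number.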

Experiments

We considered deep learning based models for our initial set of experiments, primarily because of their popularity and the promise they have demonstrated in object detection over the past several years.

Many deep learning based models have been used for object detection. The purpose of these experiments was to gain deeper insight into the performance of deep neural networks for pedestrian detection. We experimented with Faster R-CNN [88] and the SSD detector [78]. These were chosen because they are recent models (2015 for Faster R-CNN and 2016 for SSD) and have displayed state-of-the-art detection speed and accuracy across many object categories.

The results shown in Table 7 were obtained by fine-tuning VGG-16 with the ImageNet and MS-COCO datasets, without any public dataset specific to pedestrian detection. We therefore took this fine-tuned model and further fine-tuned it with different pedestrian datasets to study the effectiveness of fine-tuning with pedestrian-specific data.

Each row in the first column of Table 8 indicates the dataset(s) used to fine-tune the model. For each row, the model was fine-tuned using the dataset indicated in its first column together with the datasets indicated in the first column of all preceding rows. The model was then evaluated against the test set of each dataset, and the resulting miss rates are reported in the table.
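The cumulative reading of the rows can be spelled out with a few lines of code. This is only an illustration of how to read the table, not part of the training pipeline; the dataset order is taken from the rows of Table 8 and the function name is hypothetical.

```python
# Order in which pedestrian datasets are added, read off the rows of Table 8.
ORDER = ["Inria", "Daimler", "ETH-Zurich", "Caltech"]


def cumulative_schedule(order):
    """For each row, list every dataset used for fine-tuning up to that row."""
    return [order[: i + 1] for i in range(len(order))]


schedule = cumulative_schedule(ORDER)
for stage in schedule:
    # The row label is the dataset added last; the model has seen all of `stage`.
    print("+" + stage[-1], "fine-tuned with", ", ".join(stage))
```

For instance, the row labelled +ETH-Zurich corresponds to a model fine-tuned with Inria, Daimler and ETH-Zurich.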

**Table 7.** Performance of fine-tuned Faster R-CNN on pedestrian detection datasets. Numbers indicate the miss rate.

| Dataset | Faster R-CNN | State of the art |
|---|---|---|
| Inria | 13.47% | 13% |
| Daimler | 37.7% | 29% |
| ETH-Zurich | 32.1% | |
| Caltech | 26.7% | 19% |
| TUD-Brussels | 52.2% | 45% |

**Table 8.** Faster R-CNN performance after fine-tuning with pedestrian datasets. Numbers indicate the miss rate; columns are the image datasets used for testing.

| Trained model | Inria | Daimler | TUD-Brussels | ETH-Zurich | Caltech |
|---|---|---|---|---|---|
| +Inria | 13.4% | 36.9% | 52% | 32.1% | 28.2% |
| +Daimler | 13.6% | 33.7% | 51.1% | 32.7% | 29.1% |
| +ETH-Zurich | 13.8% | 34.6% | 49.3% | 32% | 26% |
| +Caltech | 16% | 35.4% | 48% | 33.2% | 25.2% |

While the initial results in Table 7 are encouraging, they still need considerable improvement, especially on complex datasets such as TUD-Brussels and Caltech. Table 8 also shows that fine-tuning with pedestrian datasets tends to improve performance, but the magnitude of the improvement varies with the dataset(s) used for fine-tuning and the dataset(s) used for testing. These observations indicate some important research directions. Data in computer vision applications are highly varied, and it is not easy to capture their complexity and variations. It is important to work on better dataset usage, for example by clustering the datasets based on traits such as viewpoint and resolution. Resolution is also an important element in its own right, as it significantly affects deep learning based approaches: deep learning involves automated feature extraction from the pixel level, and low-resolution appearance often makes that problem difficult.
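The two observations above can be checked with simple arithmetic on the reported numbers. The sketch below copies the miss rates from Tables 7 and 8 and computes, for each dataset, the gap to the state of the art and the change between the first and last rows of Table 8. The variable names are hypothetical, and ETH-Zurich is skipped in the first computation because Table 7 gives no state-of-the-art value for it.

```python
# Miss rates (%) from Table 7: (Faster R-CNN, state of the art).
# ETH-Zurich has no state-of-the-art entry in the table, hence None.
TABLE7 = {
    "Inria": (13.47, 13.0),
    "Daimler": (37.7, 29.0),
    "ETH-Zurich": (32.1, None),
    "Caltech": (26.7, 19.0),
    "TUD-Brussels": (52.2, 45.0),
}

# Miss rates (%) from the first (+Inria) and last (+Caltech) rows of Table 8.
TABLE8_FIRST = {"Inria": 13.4, "Daimler": 36.9, "TUD-Brussels": 52.0,
                "ETH-Zurich": 32.1, "Caltech": 28.2}
TABLE8_LAST = {"Inria": 16.0, "Daimler": 35.4, "TUD-Brussels": 48.0,
               "ETH-Zurich": 33.2, "Caltech": 25.2}

# Gap to the state of the art, in percentage points (positive = worse than it).
gap = {name: round(ours - sota, 2)
       for name, (ours, sota) in TABLE7.items() if sota is not None}

# Change in miss rate between the two rows (negative = improvement).
change = {name: round(TABLE8_LAST[name] - TABLE8_FIRST[name], 2)
          for name in TABLE8_FIRST}

for name in TABLE8_FIRST:
    print(f"{name:13s} gap={gap.get(name)} change={change[name]:+.1f}")
```

Read this way, the gap to the state of the art ranges from 0.47 points on Inria to 8.7 points on Daimler. Between the first and last rows of Table 8, the miss rate drops by 4.0 points on TUD-Brussels, 3.0 on Caltech and 1.5 on Daimler, while it rises by 2.6 points on Inria and 1.1 on ETH-Zurich, which illustrates how strongly the effect depends on the test dataset.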

We intend to address these issues in our subsequent work on the pedestrian detection problem.
